Some practice
Let’s practice some usefull commands
Exercise 1
Let’s practice dividing and joining data.
Import CSV
Importing the CSV database under the name of Q1_B, with “,” as separator, and with “.” as decimal:
Create a subset by row
Checking the identification variable of this dataset (FSN)
The median is 52032
Let’s divide the data in two parts taking as cut point the half of the rows (The median of FSN)
Seeing the structure of the original dataset and the new subsets
'data.frame': 13377 obs. of 40 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ FSN : num 45001 45002 45003 45004 45005 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Neem_Tree_Distance : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Size : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Age : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Usage : chr "" "" "" "" ...
$ Neem_Tree_Use_Other: chr "" "" "" "" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Bamboo_Tree_Dist : num 3 10 16 17 5 4 15 3 1 10 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 1 1 0 ...
$ Banana_Tree_Dist : num -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num 4 12 16 13 4 13 10 10 7 12 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ No_Mosquito_Net : num 0 2 1 0 0 0 0 0 1 0 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 155 153 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 0 1 ...
$ Roof_Material : num 161 159 158 156 156 156 156 158 158 156 ...
$ Other_Roof_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 164 164 162 163 163 162 164 164 162 ...
$ Other_Wall_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 1 1 1 0 0 0 0 1 1 0 ...
$ Granaries_in_HH : num 1 1 1 0 1 1 0 1 1 0 ...
$ Source_Drink_Water : num 167 167 92 92 92 92 92 92 92 92 ...
$ Other_Src_Drink_Wat: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 177 177 177 177 177 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Of_Rooms : num 2 2 3 1 2 3 2 1 3 1 ...
$ No_Sleeping_Rooms : num 2 2 2 1 2 2 2 1 1 1 ...
'data.frame': 6688 obs. of 40 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ FSN : num 45001 45002 45003 45004 45005 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Neem_Tree_Distance : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Size : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Age : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Usage : chr "" "" "" "" ...
$ Neem_Tree_Use_Other: chr "" "" "" "" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Bamboo_Tree_Dist : num 3 10 16 17 5 4 15 3 1 10 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 1 1 0 ...
$ Banana_Tree_Dist : num -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num 4 12 16 13 4 13 10 10 7 12 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ No_Mosquito_Net : num 0 2 1 0 0 0 0 0 1 0 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 155 153 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 0 1 ...
$ Roof_Material : num 161 159 158 156 156 156 156 158 158 156 ...
$ Other_Roof_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 164 164 162 163 163 162 164 164 162 ...
$ Other_Wall_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 1 1 1 0 0 0 0 1 1 0 ...
$ Granaries_in_HH : num 1 1 1 0 1 1 0 1 1 0 ...
$ Source_Drink_Water : num 167 167 92 92 92 92 92 92 92 92 ...
$ Other_Src_Drink_Wat: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 177 177 177 177 177 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Of_Rooms : num 2 2 3 1 2 3 2 1 3 1 ...
$ No_Sleeping_Rooms : num 2 2 2 1 2 2 2 1 1 1 ...
'data.frame': 6689 obs. of 40 variables:
$ ID : int 6689 6690 6691 6692 6693 6694 6695 6696 6697 6698 ...
$ FSN : num 52032 52033 52034 52035 52036 ...
$ Neem_Tree : num 1 1 1 1 1 1 1 1 1 0 ...
$ Neem_Tree_Distance : num 12 30 38 38 12 16 20 25 45 -1 ...
$ Neem_Tree_Size : num 2 2 2 2 2 2 2 2 2 -1 ...
$ Neem_Tree_Age : num 2 2 2 2 2 2 2 2 2 -1 ...
$ Neem_Tree_Usage : chr "188_" "188_" "188_" "188_" ...
$ Neem_Tree_Use_Other: chr "" "" "" "" ...
$ Bamboo_Tree : num 0 0 0 0 1 1 0 0 0 1 ...
$ Bamboo_Tree_Dist : num -1 -1 -1 -1 5 15 -1 -1 -1 18 ...
$ Banana_Tree : num 1 1 0 0 1 1 1 1 1 1 ...
$ Banana_Tree_Dist : num 1 20 -1 -1 1 16 5 2 1 1 ...
$ Rice_Field : num 0 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num -1 1 1 1 10 15 18 12 1 2 ...
$ Perm_Water_Body : num 0 0 0 0 1 1 1 1 1 1 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 2 1 8 12 26 17 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 1 1 1 1 1 1 ...
$ No_Mosquito_Net : num 1 1 1 0 1 4 4 5 1 0 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 155 155 153 153 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 0 0 1 0 ...
$ Roof_Material : num 161 156 156 156 54 158 158 158 161 158 ...
$ Other_Roof_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 162 162 162 164 165 165 165 164 165 ...
$ Other_Wall_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 0 0 0 0 1 1 1 1 1 1 ...
$ Granaries_in_HH : num 1 1 1 1 1 1 1 1 1 1 ...
$ Source_Drink_Water : num 92 167 167 92 167 167 167 167 167 167 ...
$ Other_Src_Drink_Wat: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 174 174 174 177 174 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 1 1 1 0 0 ...
$ No_Of_Rooms : num 2 2 1 1 3 4 5 4 3 1 ...
$ No_Sleeping_Rooms : num 1 2 1 1 2 3 4 3 1 1 ...
In Q1_B we have 13377 obs
If we sum the observations in the subsets:
Top + bottom: 6688 + 6689 =13377
So, it is ok
Joining (Append)
Let’s join the two subsets (top and bottom)
'data.frame': 13377 obs. of 40 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ FSN : num 45001 45002 45003 45004 45005 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Neem_Tree_Distance : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Size : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Age : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Usage : chr "" "" "" "" ...
$ Neem_Tree_Use_Other: chr "" "" "" "" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Bamboo_Tree_Dist : num 3 10 16 17 5 4 15 3 1 10 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 1 1 0 ...
$ Banana_Tree_Dist : num -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num 4 12 16 13 4 13 10 10 7 12 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ No_Mosquito_Net : num 0 2 1 0 0 0 0 0 1 0 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 155 153 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 0 1 ...
$ Roof_Material : num 161 159 158 156 156 156 156 158 158 156 ...
$ Other_Roof_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 164 164 162 163 163 162 164 164 162 ...
$ Other_Wall_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 1 1 1 0 0 0 0 1 1 0 ...
$ Granaries_in_HH : num 1 1 1 0 1 1 0 1 1 0 ...
$ Source_Drink_Water : num 167 167 92 92 92 92 92 92 92 92 ...
$ Other_Src_Drink_Wat: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 177 177 177 177 177 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Of_Rooms : num 2 2 3 1 2 3 2 1 3 1 ...
$ No_Sleeping_Rooms : num 2 2 2 1 2 2 2 1 1 1 ...
There are 13377 obs
It is the same of the original, So it is ok
Create a subset by coll
Seeing the names of the collums (name of the variables)
[1] "ID" "FSN" "Neem_Tree"
[4] "Neem_Tree_Distance" "Neem_Tree_Size" "Neem_Tree_Age"
[7] "Neem_Tree_Usage" "Neem_Tree_Use_Other" "Bamboo_Tree"
[10] "Bamboo_Tree_Dist" "Banana_Tree" "Banana_Tree_Dist"
[13] "Rice_Field" "Rice_Field_Dist" "Perm_Water_Body"
[16] "Perm_Wat_Body_Dist" "Wat_Body_Mid_Point" "No_Mosquito_Net"
[19] "Sprayed_2010" "Sprayed_2009" "Floor_Material"
[22] "Other_Floor_Mat" "Is_Floor_Damp" "Roof_Material"
[25] "Other_Roof_Material" "Wall_Material" "Other_Wall_Material"
[28] "Windows_in_Room" "Granaries_in_HH" "Source_Drink_Water"
[31] "Other_Src_Drink_Wat" "Toilet_Facility" "Other_Toilet_Fac"
[34] "Cooking_Fuel" "Other_Cooking_Fuel" "Source_Light"
[37] "Other_Source_Light" "Electricity_in_HH" "No_Of_Rooms"
[40] "No_Sleeping_Rooms"
Let’s divide the data in two parts taking as cut point the half of the columns
We have to be careful to keep an identification variable in each subset (in this case is FSN)
Left <- subset(Q1_B, select=c("ID", "FSN", "Neem_Tree", "Neem_Tree_Distance", "Neem_Tree_Size", "Neem_Tree_Age", "Neem_Tree_Usage", "Neem_Tree_Use_Other", "Bamboo_Tree", "Bamboo_Tree_Dist", "Banana_Tree", "Banana_Tree_Dist", "Rice_Field", "Rice_Field_Dist", "Perm_Water_Body", "Perm_Wat_Body_Dist", "Wat_Body_Mid_Point", "No_Mosquito_Net", "Sprayed_2010", "Sprayed_2009", "Floor_Material"))
str(Left)'data.frame': 13377 obs. of 21 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ FSN : num 45001 45002 45003 45004 45005 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Neem_Tree_Distance : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Size : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Age : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Usage : chr "" "" "" "" ...
$ Neem_Tree_Use_Other: chr "" "" "" "" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Bamboo_Tree_Dist : num 3 10 16 17 5 4 15 3 1 10 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 1 1 0 ...
$ Banana_Tree_Dist : num -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num 4 12 16 13 4 13 10 10 7 12 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ No_Mosquito_Net : num 0 2 1 0 0 0 0 0 1 0 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 155 153 ...
Rigth <- subset(Q1_B, select=c("FSN", "Other_Floor_Mat", "Is_Floor_Damp", "Roof_Material", "Other_Roof_Material", "Wall_Material", "Other_Wall_Material", "Windows_in_Room", "Granaries_in_HH", "Source_Drink_Water", "Other_Src_Drink_Wat", "Toilet_Facility", "Other_Toilet_Fac", "Cooking_Fuel", "Other_Cooking_Fuel", "Source_Light", "Other_Source_Light", "Electricity_in_HH", "No_Of_Rooms", "No_Sleeping_Rooms"))
#Or select the opposite (With a "-" in front of the c):
#Rigth <- subset(Q1_B, select=-c("ID", "Neem_Tree", "Neem_Tree_Distance", "Neem_Tree_Size", "Neem_Tree_Age", "Neem_Tree_Usage", "Neem_Tree_Use_Other", "Bamboo_Tree", "Bamboo_Tree_Dist", "Banana_Tree", "Banana_Tree_Dist", "Rice_Field", "Rice_Field_Dist", "Perm_Water_Body", "Perm_Wat_Body_Dist", "Wat_Body_Mid_Point", "No_Mosquito_Net", "Sprayed_2010", "Sprayed_2009", "Floor_Material"))
str(Rigth)'data.frame': 13377 obs. of 20 variables:
$ FSN : num 45001 45002 45003 45004 45005 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 0 1 ...
$ Roof_Material : num 161 159 158 156 156 156 156 158 158 156 ...
$ Other_Roof_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 164 164 162 163 163 162 164 164 162 ...
$ Other_Wall_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 1 1 1 0 0 0 0 1 1 0 ...
$ Granaries_in_HH : num 1 1 1 0 1 1 0 1 1 0 ...
$ Source_Drink_Water : num 167 167 92 92 92 92 92 92 92 92 ...
$ Other_Src_Drink_Wat: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 177 177 177 177 177 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Of_Rooms : num 2 2 3 1 2 3 2 1 3 1 ...
$ No_Sleeping_Rooms : num 2 2 2 1 2 2 2 1 1 1 ...
Joining (Merge)
Let’s join the two subsets (left and right) according to the FSN
'data.frame': 13377 obs. of 40 variables:
$ FSN : num 45001 45002 45003 45004 45005 ...
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Neem_Tree : num 0 0 0 0 0 0 0 0 0 0 ...
$ Neem_Tree_Distance : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Size : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Age : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Neem_Tree_Usage : chr "" "" "" "" ...
$ Neem_Tree_Use_Other: chr "" "" "" "" ...
$ Bamboo_Tree : num 1 1 1 1 1 1 1 1 1 1 ...
$ Bamboo_Tree_Dist : num 3 10 16 17 5 4 15 3 1 10 ...
$ Banana_Tree : num 0 0 0 0 0 0 0 1 1 0 ...
$ Banana_Tree_Dist : num -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
$ Rice_Field : num 1 1 1 1 1 1 1 1 1 1 ...
$ Rice_Field_Dist : num 4 12 16 13 4 13 10 10 7 12 ...
$ Perm_Water_Body : num 0 0 0 0 0 0 0 0 0 0 ...
$ Perm_Wat_Body_Dist : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wat_Body_Mid_Point : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ No_Mosquito_Net : num 0 2 1 0 0 0 0 0 1 0 ...
$ Sprayed_2010 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Sprayed_2009 : num 86 86 86 86 86 86 86 86 86 86 ...
$ Floor_Material : num 153 153 153 153 153 153 153 153 155 153 ...
$ Other_Floor_Mat : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Is_Floor_Damp : num 1 1 1 1 1 1 1 1 0 1 ...
$ Roof_Material : num 161 159 158 156 156 156 156 158 158 156 ...
$ Other_Roof_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Wall_Material : num 164 164 164 162 163 163 162 164 164 162 ...
$ Other_Wall_Material: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Windows_in_Room : num 1 1 1 0 0 0 0 1 1 0 ...
$ Granaries_in_HH : num 1 1 1 0 1 1 0 1 1 0 ...
$ Source_Drink_Water : num 167 167 92 92 92 92 92 92 92 92 ...
$ Other_Src_Drink_Wat: num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Toilet_Facility : num 177 177 177 177 177 177 177 177 177 177 ...
$ Other_Toilet_Fac : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Cooking_Fuel : num 180 180 180 180 180 180 180 180 180 180 ...
$ Other_Cooking_Fuel : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Source_Light : num 182 182 182 182 182 182 182 182 182 182 ...
$ Other_Source_Light : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Electricity_in_HH : num 0 0 0 0 0 0 0 0 0 0 ...
$ No_Of_Rooms : num 2 2 3 1 2 3 2 1 3 1 ...
$ No_Sleeping_Rooms : num 2 2 2 1 2 2 2 1 1 1 ...
Exercise 2
Let’s practice renaming variables and replacing values
'data.frame': 9029 obs. of 6 variables:
$ FSN : int 45001 45001 45002 45003 45004 45006 45009 45021 45021 45024 ...
$ Animal_Name : chr "Goats" "Poultry (chickens, ducks, pigeon)" "Buffaloes" "Cows/Oxen" ...
$ Animal_Count : int 3 1 1 1 2 2 7 3 1 3 ...
$ Animal_Distance : int 5 0 4 3 5 0 0 15 5 15 ...
$ Keep_Inside_House: int 1 1 0 1 1 1 1 0 1 1 ...
$ Days_Inside_House: int 120 365 -1 210 180 360 360 -1 150 90 ...
FSN Animal_Name Animal_Count Animal_Distance
1 45001 Goats 3 5
2 45001 Poultry (chickens, ducks, pigeon) 1 0
3 45002 Buffaloes 1 4
4 45003 Cows/Oxen 1 3
5 45004 Goats 2 5
6 45006 Cows/Oxen 2 0
Keep_Inside_House Days_Inside_House
1 1 120
2 1 365
3 0 -1
4 1 210
5 1 180
6 1 360
We observe:
There are 9029 obs
The animal name is quite large sometimes
The variable names are quite large
Renaming variables
Renaming variables labels: In collumns 2-5 to “anim”, “count”, “dist”, “indor”, “daysin”
'data.frame': 9029 obs. of 6 variables:
$ FSN : int 45001 45001 45002 45003 45004 45006 45009 45021 45021 45024 ...
$ anim : chr "Goats" "Poultry (chickens, ducks, pigeon)" "Buffaloes" "Cows/Oxen" ...
$ count : int 3 1 1 1 2 2 7 3 1 3 ...
$ dist : int 5 0 4 3 5 0 0 15 5 15 ...
$ indor : int 1 1 0 1 1 1 1 0 1 1 ...
$ daysin: int 120 365 -1 210 180 360 360 -1 150 90 ...
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 14 15
8 10 4104 3288 885 451 135 83 23 15 2 9 1 3 1 1
16 20 30 35 50 60
3 1 1 1 2 2
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
24 3307 492 774 618 255 953 238 173 217 11 578 10 87 7 9
15 16 17 18 19 20 22 23 24 25 26 29 30 32 35 38
231 9 3 28 4 149 12 2 1 59 1 2 70 1 40 2
40 43 45 49 50 55 58 60 68 69 70 75 80 90 95 100
31 1 13 1 133 10 1 29 2 3 17 14 4 1 1 163
105 110 120 150 180 200 210 250 300 400 500 600 1000
3 7 3 60 2 77 2 4 19 9 40 9 3
0 1 89
4816 4200 13
-1 0 1 3 5 7 8 10 12 15 18 20 30 35 40 45
4823 1 2 1 4 2 2 8 1 1 1 3 48 2 2 7
50 60 65 70 80 90 100 110 120 122 130 150 160 165 175 180
2 84 1 1 5 192 23 2 273 1 2 74 13 5 2 466
190 200 210 215 220 230 240 245 250 260 265 270 280 300 310 340
1 57 11 1 1 4 82 1 10 7 3 3 33 36 1 1
350 355 356 360 365
1 1 1 21 2699
Replacing values in observations
Buffaloes Cows/Oxen
1592 3055
Dogs Goats
38 3816
Other Pigs
3 20
Poultry (chickens, ducks, pigeon)
505
Q1_B_1$anim[Q1_B_1$anim =="Buffaloes"] <- "Buff"
Q1_B_1$anim[Q1_B_1$anim =="Goats"] <- "Goat"
#But if we want to keep the 4 (x) first characters in a all rows
Q1_B_1$anim <- substr(Q1_B_1$anim, 0, 3)
table(Q1_B_1$anim)
Buf Cow Dog Goa Oth Pig Pou
1592 3055 38 3816 3 20 505
'data.frame': 9029 obs. of 6 variables:
$ FSN : int 45001 45001 45002 45003 45004 45006 45009 45021 45021 45024 ...
$ anim : chr "Goa" "Pou" "Buf" "Cow" ...
$ count : int 3 1 1 1 2 2 7 3 1 3 ...
$ dist : int 5 0 4 3 5 0 0 15 5 15 ...
$ indor : int 1 1 0 1 1 1 1 0 1 1 ...
$ daysin: int 120 365 -1 210 180 360 360 -1 150 90 ...
We are going to save this data because we are going to use it later